Abstract

We first prove that, in the limit, even very modestly correlated Erdos-Renyi graphs are
correctly alignable through graph matching. Unfortunately, there are no efficient algorithms
known for graph matching (even deciding if two graphs are isomorphic is notoriously of
unknown complexity), and therefore graph matching will not directly, and by itself, provide
for efficient graph alignment. However, we prove (under mild conditions) that if there are
known seeds (i.e. a partial alignment), then a logarithmic number of such known seeds is
necessary and sufficient for an efficient linear assignment problem formulation "CNS" to
almost always give a correct alignment.

Efficient Frank-Wolfe methodology has been promulgated in the literature for use in
obtaining approximate graph matching, and we demonstrate in this paper that Frank-Wolfe
methodology's natural extension to incorporate seeds inherently includes a linear assignment
problem step which has the CNS formulation directly embedded in it. We then illustrate via
simulation experiments that, when there are few seeds, Frank-Wolfe methodology can
perform substantially better than CNS alone; indeed, the Frank-Wolfe methodology essentially
incorporates CNS as well as a non-seed-based approach.



1 Background and overview 

The graph matching problem (i.e., to find a bijection between the vertices of two graphs which
minimizes the number of edge disagreements between the graphs) has a rich and active place in
the literature, with applications in such diverse fields as document processing, video and image
analysis, pattern recognition and machine vision, to name just a few. However, there are no efficient
algorithms known for solving graph matching exactly. Even the easier problem of just deciding if
two graphs are isomorphic is notoriously of unknown complexity [T],[l0]. Indeed, graph matching 
is a special case of the NP-hard quadratic assignment problem, and if the graphs are allowed to 
be directed, loopy, and weighted, then graph matching is actually equivalent to the quadratic 
assignment problem. Because of its practical applicability, there is a vast amount of literature 
devoted to approximate graph matching algorithms; for a survey of the existing literature, see e.g. 
the Conte et al. paper titled "Thirty Years of Graph Matching in Pattern Recognition" jl]. 

An important class of approximate graph matching algorithms utilizes a Frank-Wolfe approach;
the idea is more formally described later in Section 4. To briefly describe here, such methods
relax an integer programming formulation of graph matching to obtain a continuous problem, then
perform an iterative procedure in which a linearization about the current iterate is optimized,
and the next iterate comes from a line search between the current iterate and the linearization
optimum. At the conclusion of the iterative procedure, the final iterate is projected to the nearest
integer-valued point which is feasible as a graph match, and this is taken as the approximate graph
matching solution. It turns out that the linear optimization done in each iteration can be
formulated as a linear assignment problem, which can be solved efficiently. Frank-Wolfe methodology
and variants are explored in [6], [13], and [2].
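
The relax, linearize, line-search, and project loop just described can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation from the cited papers: it assumes symmetric 0/1 adjacency matrices, rewrites the problem as maximizing trace(A P B P^T) over doubly stochastic P, starts from the barycenter, and uses SciPy's `linear_sum_assignment` for each linear assignment step. The function name `frank_wolfe_match` and the default of 20 iterations are hypothetical choices made for this sketch.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def frank_wolfe_match(A, B, n_iter=20):
    """Approximately maximize trace(A P B P^T) over permutation matrices P.

    A and B are symmetric 0/1 adjacency matrices of the same size.
    Returns (perm, history): perm[v] is the vertex of the second graph matched
    to vertex v of the first graph; history holds the relaxed objective values.
    """
    n = A.shape[0]
    A = A.astype(float)
    B = B.astype(float)
    P = np.full((n, n), 1.0 / n)              # start at the barycenter (doubly stochastic)
    objective = lambda M: np.trace(A @ M @ B @ M.T)
    history = [objective(P)]
    for _ in range(n_iter):
        grad = 2.0 * A @ P @ B                # gradient of trace(A P B P^T) for symmetric A, B
        rows, cols = linear_sum_assignment(grad, maximize=True)
        Q = np.zeros((n, n))
        Q[rows, cols] = 1.0                   # linearization optimum: a permutation matrix
        D = Q - P
        # Along P + alpha * D the objective equals a*alpha^2 + b*alpha + constant.
        a = np.trace(A @ D @ B @ D.T)
        b = 2.0 * np.trace(A @ P @ B @ D.T)
        candidates = [0.0, 1.0]
        if a < 0 and 0.0 < -b / (2.0 * a) < 1.0:
            candidates.append(-b / (2.0 * a))
        alpha = max(candidates, key=lambda t: a * t * t + b * t)
        if alpha == 0.0:                      # no improving step along this direction
            break
        P = P + alpha * D
        history.append(objective(P))
    # Projection: the nearest permutation in Frobenius norm maximizes trace(Q^T P).
    _, perm = linear_sum_assignment(P, maximize=True)
    return perm, history
```

Because the exact line search always includes the step size 0 as a candidate, the relaxed objective in `history` never decreases; the sketch makes no claim about recovering the true alignment.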

We now give an overview of the structure of this paper, and its contributions. 

In the presence of an inherent, underlying alignment function between the vertices of two
graphs, it is natural to ask how well graph matching would mirror this underlying alignment. In
Section 2.2 we describe the correlated Erdos-Renyi random graph model, which provides us with
a useful and natural setting to explore this question. The correlated Erdos-Renyi random graph
model consists of two Erdos-Renyi random graphs which share a common vertex set and a common
Bernoulli-trial probability parameter; for each pair of vertices, there is a given correlation between
the two vertices' adjacency in one graph and the two vertices' adjacency in the other graph. In this
manner, there is a natural alignment between the two graphs, and we can then explore whether
or not graph matching the two graphs will reproduce this alignment.



The first of our main results is Theorem 1, which we state in Section 2.2 and prove in Section 3.
For correlated Erdos-Renyi random graphs, under mild assumptions, Theorem 1 establishes that
even very modest correlation is sufficient (in the limit) for graph matching to reproduce the correct
alignment, while if the correlation is small enough then graph matching will not produce the correct
alignment.



Then, in Section 2.3, we discuss the seeded graph matching problem. This is a graph matching
problem for which part of the bijection between the two graphs' vertices is prespecified and fixed,
and we seek to complete the bijection so as to minimize the number of edge disagreements between
the graphs. Discussion of seeded graph matching can be found in [6]. As done in [6], we describe
in Section 4 how Frank-Wolfe methodology for approximate graph matching is naturally extended
to the setting of seeded graph matching.

Another of our main results is Theorem 2, which we state in Section 2.4 and prove in Section 3.
For correlated Erdos-Renyi graphs, under mild assumptions, Theorem 2 asserts that a logarithmic
number of seeds is necessary and sufficient for successful alignment (in the limit) when using
the Common Neighboring Seeds "CNS" procedure; in Section 4.3 we observe that this CNS
procedure turns out to be naturally embedded in approximate seeded graph matching Frank-Wolfe
methodology.

2 Graph matching, random graph model, and main results 

In this paper, all graphs will be simple graphs; in particular, edges are undirected, there are no
edges with a common vertex for both endpoints, and there are no multiple edges between any pair
of vertices. If $G$ is a graph, we will denote the vertex set of $G$ as $V(G)$ and, for any $v, v' \in V(G)$,
if $v$ and $v'$ are adjacent in $G$ then this will be denoted $v \sim_G v'$, and if $v$ and $v'$ are not adjacent
in $G$ then this will be denoted $v \not\sim_G v'$. For any finite set $V$, the symbol $\binom{V}{2}$ will denote the
set of all $\binom{|V|}{2}$ unordered pairs of distinct elements from $V$.

2.1 The graph matching problem 

We now describe the graph matching problem. Suppose $G_1$ and $G_2$ are graphs with the same number
of vertices. Let $\Pi$ denote the set of bijections $V(G_1) \to V(G_2)$. For any $\psi \in \Pi$, the number
of adjacency disagreements induced by $\psi$, which will be denoted $\Delta(\psi)$, is the number of vertex
pairs $\{v,v'\} \in \binom{V(G_1)}{2}$ such that [$v \sim_{G_1} v'$ and $\psi(v) \not\sim_{G_2} \psi(v')$] or
[$v \not\sim_{G_1} v'$ and $\psi(v) \sim_{G_2} \psi(v')$].
The graph matching problem is to find a bijection in $\Pi$ that minimizes the number of induced
edge disagreements; we will denote $\Psi := \arg\min_{\psi \in \Pi} \Delta(\psi)$. Equivalently stated, if
$n := |V(G_1)| = |V(G_2)|$, and if $A, B \in \{0,1\}^{n \times n}$ are respectively the adjacency matrices for $G_1$
and $G_2$, then the graph matching problem is to minimize $\|A - PBP^T\|_F$ over all $n$-by-$n$ permutation
matrices $P$, where $\|\cdot\|_F$ is the Frobenius matrix norm.
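
To make the equivalence between the two formulations concrete, here is a small sketch that counts adjacency disagreements directly and also evaluates the Frobenius objective. The function names and the convention that row $v$ of $P$ has its 1 in column $\psi(v)$ are assumptions made for this illustration; under that convention the squared Frobenius norm equals twice the number of disagreements, since each unordered pair appears twice in a symmetric matrix.

```python
import numpy as np


def disagreements(A, B, perm):
    """Number of adjacency disagreements induced by the bijection v -> perm[v]."""
    n = A.shape[0]
    B_aligned = B[np.ix_(perm, perm)]        # B_aligned[v, v'] = B[perm[v], perm[v']]
    iu = np.triu_indices(n, k=1)             # one entry per unordered pair {v, v'}
    return int(np.sum(A[iu] != B_aligned[iu]))


def frobenius_objective(A, B, perm):
    """||A - P B P^T||_F for the permutation matrix P encoding perm."""
    n = A.shape[0]
    P = np.zeros((n, n), dtype=int)
    P[np.arange(n), perm] = 1                # row v has its 1 in column perm[v]
    return np.linalg.norm(A - P @ B @ P.T, ord="fro")


# G1 is the path 0-1-2; G2 is the same path with vertex 0 as the middle vertex.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
B = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
print(disagreements(A, B, [1, 0, 2]))              # an isomorphism: 0 disagreements
print(disagreements(A, B, [0, 1, 2]))              # the identity map: 2 disagreements
print(frobenius_objective(A, B, [0, 1, 2]) ** 2)   # twice the disagreement count
```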

There are no efficient algorithms known for graph matching. Even the easier problem of just
deciding if $G_1$ is isomorphic to $G_2$ (i.e. deciding if there is a bijection $V(G_1) \to V(G_2)$ which
does not induce any edge disagreements) is of unknown complexity [3 [10], and is a candidate for 
being in an intermediate class strictly between P and NP-complete (if P $\neq$ NP). Also, the problem
of minimizing $\|A - PBP^T\|_F$ over all $n$-by-$n$ permutation matrices $P$, where $A$ and $B$ are any
real-valued matrices, is equivalent to the NP-hard quadratic assignment problem.

There are numerous approximate graph matching algorithms in the literature. One such
algorithm is the FAQ algorithm of Vogelstein et al. [12] utilizing the Frank-Wolfe Method, which we
describe later in Section 4. The algorithm's performance is empirically shown to be state-of-the-art
on many benchmark problems, and when a fixed constant number of Frank-Wolfe iterations
are performed, the running time of FAQ is $O(n^3)$, where $n$ is the number of vertices to be matched.
Moreover, if $100 < |V(G_1)|$ and $G_1$ is selected with a discrete-uniform distribution (i.e. all
possible graphs on $V(G_1)$ are equally likely) and $G_2$ is an isomorphic copy of $G_1$ with $V(G_2)$ being
a discrete-uniform random permutation of $V(G_1)$, then the probability that FAQ (with, say, 20
Frank-Wolfe iterations allowed) yields the correct isomorphism is empirically observed to be very
nearly 1. We choose to focus on the FAQ algorithm here because it is the simplest algorithm
utilizing the Frank-Wolfe methodology while also achieving excellent performance on many of the
QAP benchmark problems, see [12]. The SGM algorithm we consider herein is a natural extension
of the FAQ algorithm to seeded graph matching.

2.2 Correlated Erdos-Renyi graphs 

Presently, we describe the correlated Erdos-Renyi random graph model; this model will provide a
theoretical framework within which we will prove our main theorems, Theorem 1 and Theorem 2.

The model parameters are a positive integer $n$, a real number $p$ in the interval $(0, 1)$, and a real
number $\varrho$ in the interval $[0, 1]$; these parameters completely specify the distribution of the model.
There is an underlying vertex set $V$ of cardinality $n$ which is common to two graphs; call these
graphs $G_1$ and $G_2$. For each $i = 1, 2$ and each pair of vertices $\{v,v'\} \in \binom{V}{2}$, let
$\mathbb{1}_{\{v,v'\},G_i}$ denote the indicator random variable for the adjacency of $v$ and $v'$ in $G_i$.
For each $i = 1, 2$ and each pair of vertices $\{v,v'\} \in \binom{V}{2}$, the random variable
$\mathbb{1}_{\{v,v'\},G_i}$ is Bernoulli($p$) distributed, and they are all collectively independent except
that, for each pair of vertices $\{v,v'\} \in \binom{V}{2}$, the variables $\mathbb{1}_{\{v,v'\},G_1}$
and $\mathbb{1}_{\{v,v'\},G_2}$ have Pearson product-moment correlation coefficient $\varrho$. After $G_1$ and $G_2$
are thus realized, their vertices are (separately) arbitrarily relabeled, so that we don't directly observe the
alignment bijection $\Phi : V(G_1) \to V(G_2)$ wherein, for all $v \in V(G_1)$, the vertices $v$ and $\Phi(v)$ were
corresponding vertices across the graphs before the relabeling (i.e., the same element of $V$).

If we graph match $G_1$ to $G_2$, will graph matching recover the alignment bijection, i.e., will
$\Psi = \{\Phi\}$? The following theorem is our first main result. We will be considering a sequence of random
correlated Erdos-Renyi graphs with $n = 1$, then $n = 2$, then $n = 3, \ldots$, and the parameters $p$ and
$\varrho$ are each functions of $n$. In this paper, when we say a sequence of events holds almost always,
we mean that, with probability 1, all but a finite number of the events hold.



Theorem 1. Suppose there exists a fixed real number $\xi_1 < 1$ such that $p < \xi_1$. Then there exist
fixed positive real numbers $c_1, c_2, c_3, c_4$ (depending only on the value of $\xi_1$) such that:
i) If Q> ci\r^ «^^ P > '^2^^ then almost always ^ = {$}, and 



%%] 



Ifg< CsJ'-^ andp> C4^ then lim„^ooE| {^ e H : A{^) < A($)} | = 00. 



Because there is no known efficient algorithm for graph matching, this theorem does not directly 
provide a practical means of computing the alignment function. But it does hold out the hope 
that a good graph matching heuristic might be effective in approximating the alignment function 
for various classes of graphs. 

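It will be useful for us later in Section 3 to observe an equivalent way to formulate correlated
Erdos-Renyi graphs. For all pairs of vertices $\{v,v'\} \in \binom{V}{2}$, the indicator random variables
$\mathbb{1}_{\{v,v'\},G_1}$ are independently distributed Bernoulli($p$) and then (independently for the
different pairs $\{v,v'\}$), conditioning on $\mathbb{1}_{\{v,v'\},G_1} = 1$ we let $\mathbb{1}_{\{v,v'\},G_2}$ be
distributed Bernoulli($p + \varrho(1-p)$) and, conditioning on $\mathbb{1}_{\{v,v'\},G_1} = 0$, we let
$\mathbb{1}_{\{v,v'\},G_2}$ be distributed Bernoulli($p(1-\varrho)$). It is an easy exercise to verify
that as such, for each $\{v,v'\} \in \binom{V}{2}$, it holds that $\mathbb{1}_{\{v,v'\},G_2}$ is distributed
Bernoulli($p$), and that the correlation of $\mathbb{1}_{\{v,v'\},G_1}$ and $\mathbb{1}_{\{v,v'\},G_2}$ is $\varrho$,
as desired. Indeed, the marginal probability of adjacency in $G_2$ is
$p(p + \varrho(1-p)) + (1-p)p(1-\varrho) = p$, and the covariance of the two indicators is
$p(p + \varrho(1-p)) - p^2 = \varrho\, p(1-p)$, which is $\varrho$ times their common variance $p(1-p)$.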
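
The conditional formulation above translates directly into a sampler. The following is a minimal sketch, assuming NumPy's `Generator` API and writing `rho` for the correlation parameter; the function name `sample_correlated_er` is hypothetical, and the vertices are deliberately not relabeled, so the alignment bijection here is the identity.

```python
import numpy as np


def sample_correlated_er(n, p, rho, rng):
    """Sample adjacency matrices (A, B) of correlated Erdos-Renyi graphs.

    Vertices are not relabeled, so the alignment bijection is the identity.
    """
    iu = np.triu_indices(n, k=1)                 # one entry per unordered pair {v, v'}
    e1 = rng.random(iu[0].size) < p              # edges of G1: Bernoulli(p)
    # Conditional edge probability in G2, given the same pair's status in G1.
    prob2 = np.where(e1, p + rho * (1.0 - p), p * (1.0 - rho))
    e2 = rng.random(iu[0].size) < prob2
    A = np.zeros((n, n), dtype=int)
    B = np.zeros((n, n), dtype=int)
    A[iu] = e1
    B[iu] = e2
    return A + A.T, B + B.T                      # symmetrize; diagonals stay zero


rng = np.random.default_rng(0)
A, B = sample_correlated_er(300, 0.3, 0.5, rng)
iu = np.triu_indices(300, k=1)
print(A[iu].mean(), B[iu].mean())                # both should be near p = 0.3
print(np.corrcoef(A[iu], B[iu])[0, 1])           # should be near rho = 0.5
```

The empirical edge densities and the empirical correlation are only approximately equal to $p$ and $\varrho$; they concentrate as the number of vertex pairs grows.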

2.3 Seeded graph matching, the CNS algorithm 



Continuing with the setting from Section 2.1, suppose that we are also given a subset $U_1 \subset V(G_1)$
of seeds and an injective seeding function $\phi : U_1 \to V(G_2)$; say that $U_2 \subset V(G_2)$ is the image of $\phi$.
Let $\Pi_\phi$ denote the set of bijections $\psi : V(G_1) \to V(G_2)$ such that for all $u \in U_1$ it holds that
$\psi(u) = \phi(u)$. As before, for any bijection $\psi \in \Pi_\phi$, the number of adjacency disagreements induced
by $\psi$, which will be denoted $\Delta(\psi)$, is the number of vertex pairs $\{v,v'\} \in \binom{V(G_1)}{2}$ such that
[$v \sim_{G_1} v'$ and $\psi(v) \not\sim_{G_2} \psi(v')$] or [$v \not\sim_{G_1} v'$ and $\psi(v) \sim_{G_2} \psi(v')$].
The seeded graph matching problem is to find a bijection in $\Pi_\phi$ that minimizes the number of induced
edge disagreements; as before, we will denote $\Psi := \arg\min_{\psi \in \Pi_\phi} \Delta(\psi)$. Equivalently stated,
suppose without loss of generality that $U_1 = U_2 = \{v_1, v_2, \ldots, v_s\}$, and that for all
$j = 1, 2, \ldots, s$, $\phi(v_j) = v_j$; with $A$ and $B$ denoting the adjacency matrices for $G_1$ and $G_2$
respectively, the seeded graph matching problem is to minimize $\|A - (I \oplus P) B (I \oplus P)^T\|_F$ over all
$m$-by-$m$ permutation matrices $P$, where $m := |V(G_1)| - s$, and $\oplus$ is the direct sum, and $I$ is the
$s$-by-$s$ identity matrix.



In Section 4.2 we describe the approximate seeded graph matching algorithm (abbreviated
SGM) from [6], which is precisely the approach/methodology of the FAQ algorithm of [12] when
naturally extended to include the situation where there are seeds. The running time of SGM is
$O(n^3)$, where $n$ is the total number of vertices in each graph.



We now present the Common Neighboring Seeds algorithm "CNS" for approximate seeded
graph matching; note that we will demonstrate later in Section 4.3 that the CNS algorithm is
naturally embedded in the SGM algorithm. Let $W_1 := V(G_1) \setminus U_1$ denote the nonseeds in $V(G_1)$.
For any $\psi \in \Pi_\phi$, let $f(\psi)$ denote the number of pairs $(w,u) \in W_1 \times U_1$ such that both
$w \sim_{G_1} u$ and also $\psi(w) \sim_{G_2} \psi(u)$. Denote $\Psi_{CNS} := \arg\max_{\psi \in \Pi_\phi} f(\psi)$.
Finding a member of $\Psi_{CNS}$ can be done in $O(n^3)$ time (where $n = |V(G_1)|$), as we next describe, and
this member of $\Psi_{CNS}$ is the output of the CNS algorithm and is taken as an approximate solution to the
seeded graph matching problem. (I.e., $\Psi_{CNS} \approx \Psi$.) Specifically, if the adjacency matrices for
$G_1$ and $G_2$ are respectively partitioned as
$A = \begin{bmatrix} A_{11} & A_{21}^T \\ A_{21} & A_{22} \end{bmatrix}$ and
$B = \begin{bmatrix} B_{11} & B_{21}^T \\ B_{21} & B_{22} \end{bmatrix}$, where
$A_{21}, B_{21} \in \mathbb{R}^{|W_1| \times |U_1|}$ each represent the adjacencies between the nonseed
vertices and the seed vertices (and the seed vertices are ordered in $A$ conformally to $B$), then finding a
member of $\Psi_{CNS}$ is done by maximizing $\mathrm{trace}(P^T A_{21} B_{21}^T)$ over all
$|W_1| \times |W_1|$ permutation matrices $P$, which is clearly a linear assignment problem and can
be solved in $O(|W_1|^3)$ time with the Hungarian Algorithm [5], [9].
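
The CNS step can be sketched in a few lines. This is an illustrative sketch, not the authors' code: it assumes the seeds are the first `s` vertices of both graphs with seed `j` matched to seed `j`, and it uses SciPy's `linear_sum_assignment` (a linear assignment solver, not necessarily the Hungarian Algorithm) to maximize the trace. The function name `cns_match` and the toy graph are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def cns_match(A, B, s):
    """Common Neighboring Seeds matching.

    Assumes the first s vertices of each graph are the seeds and that seed j
    of the first graph is matched to seed j of the second graph.
    Returns match, where nonseed s + i of the first graph is matched to
    nonseed s + match[i] of the second graph.
    """
    A21 = A[s:, :s]                      # nonseed-to-seed adjacencies in G1
    B21 = B[s:, :s]                      # nonseed-to-seed adjacencies in G2
    common = A21 @ B21.T                 # common[i, j] = number of seeds adjacent to both i and j
    _, match = linear_sum_assignment(common, maximize=True)
    return match


# Tiny example: 3 seeds, 4 nonseeds with pairwise distinct seed neighborhoods.
s = 3
A21 = np.array([[0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0]])
A = np.zeros((7, 7), dtype=int)
A[s:, :s] = A21
A[:s, s:] = A21.T
shuffle = np.array([2, 0, 3, 1])         # nonseed j of G2 is nonseed shuffle[j] of G1
idx = np.concatenate([np.arange(s), s + shuffle])
B = A[np.ix_(idx, idx)]                  # G2: a copy of G1 with nonseeds relabeled
match = cns_match(A, B, s)
print(shuffle[match])                    # prints [0 1 2 3]: every nonseed is recovered
```

In this toy example recovery is exact because the nonseeds have distinct seed neighborhoods and the two graphs are identical up to relabeling; with noisy, correlated graphs the number of seeds governs whether the optimum is the true alignment.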

2.4 Seeded, correlated Erdos-Renyi graphs

Seeded, correlated Erdos-Renyi graphs are correlated Erdos-Renyi graphs $G_1$ and $G_2$ where part
of the alignment function is observed; specifically, there is a subset of seeds $U_1 \subset V(G_1)$ such that
$\Phi$ is known on $U_1$. If we take $\phi$ to be the restriction of $\Phi$ to $U_1$ and we perform approximate seeded
graph matching using CNS, we may hope that $\Psi_{CNS} = \{\Phi\}$; if this is true then we are provided
an efficient means of computing the alignment function.

The next theorem is another of our main results. We will be considering a sequence of random 
correlated Erdos-Renyi graphs where the number of nonseed vertices is m = 1, then m = 2, then 
m = 3 . . ., and the number of seeds s is a function of m. 

Theorem 2. Suppose there exists a fixed real number $\xi_2 > 0$ such that $\xi_2 < p < 1 - \xi_2$ and
$\xi_2 < \varrho < 1 - \xi_2$. Then there exist fixed real numbers $c_5, c_6 > 0$ (depending only on $\xi_2$) such that:
i) If $s > c_5 \log m$ then almost always $\Psi_{CNS} = \{\Phi\}$, and
ii) If $s < c_6 \log m$ then $\lim_{m \to \infty} \mathbb{E}\,|\{\psi \in \Pi_\phi : f(\psi) > f(\Phi)\}| = \infty$.

Note that a member of $\Psi_{CNS}$ is efficiently computable, and thus Theorem 2 (unlike Theorem 1)
directly provides a means to efficiently recover the alignment bijection $\Phi$, if there are enough seeds.



3 Proof of main results, Theorem 1 and Theorem 2

Theorem 1 is proved in Sections 3.2, 3.3, and 3.4, and each of these three subsections continues
the previous one. Theorem 2 is proved in Sections 3.5, 3.6, and 3.7, and each of these three
subsections likewise continues the previous one. The underlying methodology for proving Theorem 1
is very similar to the methodology for proving Theorem 2. We begin with some results that will
subsequently be used in the proofs of Theorems 1 and 2.

3.1 Supporting results 

The next result, Theorem 3, is from [1], in the form found in [8].

Theorem 3. Suppose random variable $X$ is a function of $n$ independent Bernoulli($q$) random
variables such that changing the value of any one of the Bernoulli random variables changes the value
of $X$ by at most 2. For any $t < \sqrt{nq(1-q)}$, we have
$P\left[\, |X - \mathbb{E}X| > 4t\sqrt{nq(1-q)} \,\right] < 2e^{-t^2}$.

The next result, Theorem 4, is a Chernoff-Hoeffding bound which is Theorem 3.2 in [3].

Theorem 4. Suppose $X$ has a Binomial($n, q$) distribution. Then for all real numbers $t \ge 0$ it holds
that $P\left[ X - \mathbb{E}X > t \right] < e^{-\frac{t^2}{2nq + 2t/3}}$.

For any $r, q \in (0, 1)$, define $H(r, q) := r \log\left(\frac{r}{q}\right) + (1-r)\log\left(\frac{1-r}{1-q}\right)$.
This is the Kullback-Leibler divergence between Bernoulli random variables with respective success
probabilities $r$ and $q$. We will later use the following rough lower bound estimate of a binomial tail probability:

Proposition 5. Suppose X has a Binomial(n, q) distribution, and suppose that 0<g<r<l — - 



for a real number r. Then F{X > nr) > ^ ■ J^^n-'^/'^q ■ e-"-^^'^'^). 

Proof of Proposition [5| We compute and bound 

P(X>nr) > P(X= rnrl)= f/,^grnrl/i_^)n-Kl 

\\nr\J 






5 






where the inequality in the second display line follows from Stirling's formula. Now, 

\ nr+0.5 

nr / 

-. \ nr+0.5 -, 



Combining the above, we obtain 



nx>nr) > ^■ J,",, n-'^'r-\i-qr 



ri/2(i_r)i/2 •^ ^ '^^ (nr)"'^(n-nr) 



_![. jl:iIn-i/2g.e-"^(^''') 



as desired. D 



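3.2 Overall argument of the proof for Theorem 1, part i)

It is notationally convenient to assume without loss of generality that the correlated Erdos-Renyi
graphs $G_1$ and $G_2$ are on the same set of $n$ vertices $V$ and we do not relabel the vertices. Let $\Pi$
denote the set of bijections $V \to V$; here, the identity function $e \in \Pi$ is the alignment bijection $\Phi$.
For any $\psi \in \Pi$, define

\begin{align*}
\Delta^{+}(G_1,G_2,\psi) &:= \left|\left\{ \{v,v'\} \in \tbinom{V}{2} \text{ such that } v \not\sim_{G_1} v' \text{ and } \psi(v) \sim_{G_2} \psi(v') \right\}\right|, \\
\Delta^{-}(G_1,G_2,\psi) &:= \left|\left\{ \{v,v'\} \in \tbinom{V}{2} \text{ such that } v \sim_{G_1} v' \text{ and } \psi(v) \not\sim_{G_2} \psi(v') \right\}\right|, \\
\Delta^{0+}(G_1,G_2,\psi) &:= \left|\left\{ \{v,v'\} \in \tbinom{V}{2} \text{ such that } v \not\sim_{G_1} v' \text{ and } \psi(v) \sim_{G_1} \psi(v') \text{ and } \psi(v) \not\sim_{G_2} \psi(v') \right\}\right|, \\
\Delta^{0-}(G_1,G_2,\psi) &:= \left|\left\{ \{v,v'\} \in \tbinom{V}{2} \text{ such that } v \sim_{G_1} v' \text{ and } \psi(v) \not\sim_{G_1} \psi(v') \text{ and } \psi(v) \sim_{G_2} \psi(v') \right\}\right|,
\end{align*}

and $\Delta(G_1,G_2,\psi) := \Delta^{+}(G_1,G_2,\psi) + \Delta^{-}(G_1,G_2,\psi)$.
First, note that

\[ \Delta^{+}(G_1,G_1,\psi) = \Delta^{-}(G_1,G_1,\psi) = \tfrac{1}{2}\Delta(G_1,G_1,\psi); \tag{1} \]

this is because the number of edges in $G_1$ isn't changed when its vertices are permuted by $\psi$.
Next, note that

\[ \Delta(G_1,G_2,\psi) - \Delta(G_1,G_2,e) = \Delta(G_1,G_1,\psi) - 2\,\Delta^{0+}(G_1,G_2,\psi) - 2\,\Delta^{0-}(G_1,G_2,\psi); \tag{2} \]

this is easily verified by replacing "$G_2$" in (2) by "$G$", and observing the truth of (2) as $G$, starting
out with $G = G_1$, is changed one edge-flip at a time until $G = G_2$.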
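
Identities (1) and (2) are purely combinatorial, so they hold for any pair of graphs on a common vertex set and any bijection, and can be spot-checked numerically. The sketch below assumes the definitions of the four counts as stated above; the helper names `random_graph` and `delta_counts` are hypothetical, and the two graphs are drawn independently, since no correlation is needed for the identities.

```python
import numpy as np


def random_graph(n, p, rng):
    """Symmetric 0/1 adjacency matrix with zero diagonal."""
    upper = np.triu(rng.random((n, n)) < p, k=1).astype(int)
    return upper + upper.T


def delta_counts(A1, A2, perm):
    """Return (Delta+, Delta-, Delta0+, Delta0-) for graphs G1, G2 and the bijection perm."""
    iu = np.triu_indices(A1.shape[0], k=1)
    g1 = A1[iu].astype(bool)                            # v ~ v' in G1
    g1_psi = A1[np.ix_(perm, perm)][iu].astype(bool)    # psi(v) ~ psi(v') in G1
    g2_psi = A2[np.ix_(perm, perm)][iu].astype(bool)    # psi(v) ~ psi(v') in G2
    plus = int(np.sum(~g1 & g2_psi))
    minus = int(np.sum(g1 & ~g2_psi))
    zero_plus = int(np.sum(~g1 & g1_psi & ~g2_psi))
    zero_minus = int(np.sum(g1 & ~g1_psi & g2_psi))
    return plus, minus, zero_plus, zero_minus


rng = np.random.default_rng(0)
n = 30
G1, G2 = random_graph(n, 0.4, rng), random_graph(n, 0.4, rng)
psi, e = rng.permutation(n), np.arange(n)

p11, m11, _, _ = delta_counts(G1, G1, psi)
p12, m12, z_plus, z_minus = delta_counts(G1, G2, psi)
pe, me, _, _ = delta_counts(G1, G2, e)
print(p11 == m11)                                                          # identity (1): True
print((p12 + m12) - (pe + me) == (p11 + m11) - 2 * z_plus - 2 * z_minus)   # identity (2): True
```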

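Now, consider the event, which we shall call $T$, that for all $\psi \in \Pi \setminus \{e\}$,

\begin{align}
\Delta^{0+}(G_1,G_2,\psi) &< \Delta^{+}(G_1,G_1,\psi) \cdot \left( (1-p)(1-\varrho) + \tfrac{\varrho}{2} \right) \quad \text{and also} \tag{3} \\
\Delta^{0-}(G_1,G_2,\psi) &< \Delta^{-}(G_1,G_1,\psi) \cdot \left( p(1-\varrho) + \tfrac{\varrho}{2} \right). \tag{4}
\end{align}

We will next show in Section 3.3 that, under the hypotheses of the first part of Theorem 1,
$T$ almost always happens (in other words, with probability 1, $T$ happens for all but a finite
number of $n$'s). Then, adding (3) to (4) and using (1) (note that the two factors
$(1-p)(1-\varrho) + \tfrac{\varrho}{2}$ and $p(1-\varrho) + \tfrac{\varrho}{2}$ sum to 1), we obtain that almost always
$\Delta^{0+}(G_1,G_2,\psi) + \Delta^{0-}(G_1,G_2,\psi) < \tfrac{1}{2} \cdot \Delta(G_1,G_1,\psi)$ for all
$\psi \in \Pi \setminus \{e\}$. Substituting this into (2)
yields that almost always $\Delta(G_1,G_2,\psi) > \Delta(G_1,G_2,e)$ for all $\psi \in \Pi \setminus \{e\}$, and the
first part of Theorem 1 is then proven.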



3.3 Under hypotheses of Theorem 1, part i), T occurs almost always

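For any $k \in \{1, 2, \ldots, n\}$, let $\Pi(k)$ denote the set of bijections in $\Pi$ such that the number of
non-fixed-points of the bijection is exactly $k$; that is, $\Pi(k) := \{\psi \in \Pi : |\{v \in V : \psi(v) \neq v\}| = k\}$. A
simple upper bound for $|\Pi(k)|$ is $|\Pi(k)| \le \binom{n}{k} k! = n(n-1)(n-2)\cdots(n-k+1) \le n^k$.

Just for now, let $k \in \{1, 2, \ldots, n\}$ be chosen, and let $\psi \in \Pi(k)$ be chosen. Denoting
$T(\psi) := \{\{v,v'\} \in \binom{V}{2} \text{ such that } v = \psi(v'),\ v' = \psi(v)\}$, we have that the random variable
$\Delta(G_1,G_1,\psi)$ is a function of the $N := \binom{k}{2} + (n-k)k - |T(\psi)|$ independent Bernoulli($p$) random
variables $\{\mathbb{1}_{\{v,v'\},G_1}\}_{\{v,v'\} \in \binom{V}{2} \setminus T(\psi) \,:\, \psi(v) \neq v \text{ or } \psi(v') \neq v'}$,
and note that the hypotheses of Theorem 3 are satisfied, hence for the choice of
$t = \frac{1}{20}\sqrt{Np(1-p)}$ in Theorem 3 we obtain that

\[ P\left[\, \left| \Delta(G_1,G_1,\psi) - \mathbb{E}\Delta(G_1,G_1,\psi) \right| > \tfrac{1}{5} Np(1-p) \,\right] < 2e^{-Np(1-p)/400}. \tag{5} \]

Also note that

\[ \Delta(G_1,G_1,\psi) = \sum_{\{v,v'\} \in \binom{V}{2} \setminus T(\psi) \,:\, \psi(v) \neq v \text{ or } \psi(v') \neq v'} \mathbb{1}\left[ \mathbb{1}_{\{v,v'\},G_1} \neq \mathbb{1}_{\{\psi(v),\psi(v')\},G_1} \right] \]

is the sum of $N$ Bernoulli($2p(1-p)$) random variables hence

\[ \mathbb{E}\Delta(G_1,G_1,\psi) = 2Np(1-p). \tag{6} \]

Because $|T(\psi)| \le \frac{k}{2}$, we have by elementary algebra that $\frac{k(n-2)}{2} \le N \le nk$. Thus, by (5) and (6)
we obtain that (for large enough $n$; in the following our constants are very conservatively chosen)

\[ P\left[\, \frac{\Delta(G_1,G_1,\psi)}{nkp(1-p)} \notin [1/2,\, 5/2] \,\right] < 2e^{-\frac{1}{1000} nkp(1-p)} < 2e^{-\frac{1-\xi_1}{1000} nkp}. \tag{7} \]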

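Conditioning on $G_1$, random variable $\Delta^{0+}(G_1,G_2,\psi)$ has a
Binomial($\Delta^{+}(G_1,G_1,\psi)$, $(1-p)(1-\varrho)$) distribution, and random variable
$\Delta^{0-}(G_1,G_2,\psi)$ has a Binomial($\Delta^{-}(G_1,G_1,\psi)$, $p(1-\varrho)$) distribution.
Conditioning also on the event that $\frac{\Delta(G_1,G_1,\psi)}{nkp(1-p)} \in [1/2,\, 5/2]$, applying Theorem 4 with the
value $t = \frac{\varrho}{2} \cdot \Delta^{+}(G_1,G_1,\psi)$, using (1), we have that

\begin{align}
P\left[\, \Delta^{0+}(G_1,G_2,\psi) > \Delta^{+}(G_1,G_1,\psi) \cdot \left( (1-p)(1-\varrho) + \tfrac{\varrho}{2} \right) \right] &< e^{-\frac{1-\xi_1}{40} nkp\varrho^2}, \tag{8} \\
P\left[\, \Delta^{0-}(G_1,G_2,\psi) > \Delta^{-}(G_1,G_1,\psi) \cdot \left( p(1-\varrho) + \tfrac{\varrho}{2} \right) \right] &< e^{-\frac{1-\xi_1}{40} nkp\varrho^2}. \tag{9}
\end{align}

Finally, the probability of $T^c$ can be bounded using subadditivity, over all $\psi \in \Pi$ besides $e$,
on the events described and bounded in (8) and (9), as they each intersect the event described
and bounded in (7) and its complement: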

n 



k=l ip&n{k) 



fc=i 



4 
n 






the last inequality holding if p > C2-^^ and f) > Ci^/-^^ for sufficiently large, fixed constants Ci, C2. 
Because ^^^ ^ < C)0, we have by the Borel-Cantelli Lemma that T almost always happens. As 
mentioned in Section 3.2, with this fact the proof of the first part of Theorem 1 is complete. □



Remark 6. Note that we could tighten the constants $c_1$ and $c_2$ appearing above. Here we choose
not to, instead focusing on the orders of magnitude of $\varrho$, and do not pursue exact constants further.

3.4 Proof of Theorem 1, part ii)

We now prove the second part of Theorem 1.

Just for now, let $\psi \in \Pi(n)$ be chosen (i.e., $\psi$ is a derangement), and condition on
$\Delta^{+}(G_1,G_1,\psi) = \Lambda$, where $\frac{1}{4}n^2p(1-p) < \Lambda < \frac{5}{4}n^2p(1-p)$. The random variables
$\Delta^{0+}(G_1,G_2,\psi)$ and $\Delta^{0-}(G_1,G_2,\psi)$ are independent, and have distributions
Binomial($\Lambda, q_1$) and Binomial($\Lambda, q_2$), respectively, where
$q_1 := (1-p)(1-\varrho)$ and $q_2 := p(1-\varrho)$.

Denoting ri := gi + ^ and r2 := ^2+25 ^^^ observing that, under the hypotheses of Theorem [21 
part ii), it holds that ri < 1 — ^ and r2 < 1 — ^ we thus have by Proposition n^ that (as ^ > 2^) 



P|A°+(Gi,G2,V^)>A.ri and ^'-(G.^G.^i;) > A ■ r,] > ^ M^ ''^^^ ^^) ,-A-H(n...)-A./.(....) 
\ / 200A y rir2 

Note that we can change the inequalities ">" in the expression P( ) above into strict inequal- 
ities ">" with a harmless tweak. A first-order Taylor expansion of H{x + y,y) in x yields that 
H{x + y,y) < 2 (I- ) fo^ all < y < 1 and < a; < 1 — y. This, together with the fact that 
1 — ri = r2, 1 — r2 = ri and assuming that q is bounded away from 1 (which, indeed, will turn out 
to be assumed), we have that there exists a real number c > such that 

P( A°+(Gi,G'2,V^) > A-n and A°-(Gi,G2,^) > A-r2 

- 200A - n2 n? • ^ ^^ 

10 



From (1) and (7) we have that there exists a fixed constant C4 such that if p > 04-^^^ then, with 



n 



probabihty > | (for n large enough) it holds that ^n^p(l — p) < A+(G'i, Gi, ^) < |n^p(l ~ p) 



Thus, by (10), noting again that ri + r2 = 1 and that A^{Gi,Gi,iIj) = ^A(G'i, Gi, ^), we have 



unconditionally 

pfA°+(Gi,G2,^) + A°-(Gi,G2,^)>^-A(Gi,Gi,^)j > ^ . e^ (11) 

Next, the number of derangements $|\Pi(n)|$ satisfies $\lim_{n \to \infty} \frac{|\Pi(n)|}{n!} = \frac{1}{e}$, thus with Stirling's
formula we have that for $n$ large enough it will hold that $|\Pi(n)| > \left(\frac{n}{e}\right)^n$. Thus, for $n$ large enough,
by (2) and (11),



E|{V^Gi7:A(Gi,G2,^)<A(Gi,G'2,e)}| = J] p( A(Gi,G2, V^) < A(Gi,G2,e) 



> Y. P(A(G'i,G2,V^)<A(Gi,G2,e)j 



C -g ^ +nlogn-n 

2n2 



so that there exists a fixed real number C3 > such that ii g < c^y -^^ then it holds that 
E| {-0 G n{n) : A(Gi, G2, ^) < A(Gi, G2, e)} | — t- 00 as n — )■ 00, and the second part of Theorem[l] 
is proven. D 

Remark 7. Note that we could tighten the constants $c_3$ and $c_4$ appearing above. Here we choose
not to, instead focusing on the orders of magnitude of $\varrho$, and do not pursue exact constants further.

3.5 Overall argument of the proof for Theorem 2, part i)

The proof of Theorem 2 is very similar in structure to the proof of Theorem 1. For simplicity of
notation, suppose, without loss of generality, that the correlated Erdos-Renyi graphs $G_1$ and $G_2$
are on the same set of $n$ vertices $V$, and we do not relabel the vertices. Let $\Pi$ denote the set of
bijections $V \to V$; here the identity function $e \in \Pi$ is the alignment bijection. Further suppose
that $V$ is partitioned into $s$ seed vertices $U$, and $m$ nonseed vertices $W$. Let $\phi : U \to U$ be the
identity function, and let $\Pi_\phi := \{\psi \in \Pi : \forall u \in U,\ \psi(u) = u\}$. For any $\psi \in \Pi_\phi$, define

\begin{align*}
f(G_1,G_2,\psi) &:= |\{(w,u) \in W \times U : w \sim_{G_1} u \text{ and } \psi(w) \sim_{G_2} u\}|, \\
h^{+}(G_1,G_2,\psi) &:= |\{(w,u) \in W \times U : w \not\sim_{G_1} u \text{ and } \psi(w) \sim_{G_2} u\}|, \\
h^{-}(G_1,G_2,\psi) &:= |\{(w,u) \in W \times U : w \sim_{G_1} u \text{ and } \psi(w) \not\sim_{G_2} u\}|, \\
h^{0+}(G_1,G_2,\psi) &:= |\{(w,u) \in W \times U : w \not\sim_{G_1} u \text{ and } \psi(w) \sim_{G_1} u \text{ and } \psi(w) \not\sim_{G_2} u\}|, \\
h^{0-}(G_1,G_2,\psi) &:= |\{(w,u) \in W \times U : w \sim_{G_1} u \text{ and } \psi(w) \not\sim_{G_1} u \text{ and } \psi(w) \sim_{G_2} u\}|,
\end{align*}

and define $h(G_1,G_2,\psi) := h^{+}(G_1,G_2,\psi) + h^{-}(G_1,G_2,\psi)$.
First note that

\[ h^{+}(G_1,G_1,\psi) = h^{-}(G_1,G_1,\psi) = \tfrac{1}{2} h(G_1,G_1,\psi); \tag{12} \]

this can be easily verified by considering, for each $u \in U$ and for each cycle $C$ of the permutation
$\psi$, the changes of status in adjacency-to-$u$ of the vertices as the vertices of $C$ are considered in their
cyclic order. (Specifically, the number of changes along $C$ from adjacency-to-$u$ to nonadjacency-to-$u$
is equal to the number of changes along $C$ from nonadjacency-to-$u$ to adjacency-to-$u$.)
Next, note that

\[ 2\left[ f(G_1,G_2,e) - f(G_1,G_2,\psi) \right] = h(G_1,G_1,\psi) - 2h^{0+}(G_1,G_2,\psi) - 2h^{0-}(G_1,G_2,\psi); \tag{13} \]

this is easily verified by replacing "$G_2$" in (13) with "$G$", and observing the truth of (13) as $G$,
starting out with $G = G_1$, is changed one edge-flip at a time until $G = G_2$.

Now, consider the event $T$ defined as it holding that, for all $\psi \in \Pi_\phi$ besides $e$,

\begin{align}
h^{0+}(G_1,G_2,\psi) &< h^{+}(G_1,G_1,\psi) \cdot \left( (1-p)(1-\varrho) + \tfrac{\varrho}{2} \right) \quad \text{and also} \tag{14} \\
h^{0-}(G_1,G_2,\psi) &< h^{-}(G_1,G_1,\psi) \cdot \left( p(1-\varrho) + \tfrac{\varrho}{2} \right). \tag{15}
\end{align}

We will show in Section 3.6 that, under the hypotheses of the first part of Theorem 2, $T$ almost
always happens. Then, adding (14) to (15) and using (12), we obtain that almost always
$h^{0+}(G_1,G_2,\psi) + h^{0-}(G_1,G_2,\psi) < \tfrac{1}{2} \cdot h(G_1,G_1,\psi)$ for all $\psi \in \Pi_\phi \setminus \{e\}$.
Substituting this into (13) yields that almost always $f(G_1,G_2,e) > f(G_1,G_2,\psi)$ for all
$\psi \in \Pi_\phi \setminus \{e\}$, and the first part of Theorem 2 will then be proven.



3.6 Under hypotheses of Theorem 2, part i), T occurs almost always 

For any $k \in \{1, 2, \ldots, m\}$, denote $\Pi_\phi(k) := \{\psi \in \Pi_\phi : |\{v \in V : \psi(v) \neq v\}| = k\}$. Just for now,
let $k \in \{1, 2, \ldots, m\}$ be chosen, and let $\psi \in \Pi_\phi(k)$ be chosen. The random variable $h(G_1,G_1,\psi)$ is
a function of the $N := ks$ independent Bernoulli($p$) random variables
$\{\mathbb{1}_{\{w,u\},G_1}\}_{(w,u) \in W \times U \,:\, \psi(w) \neq w}$,
and note that the hypotheses of Theorem 3 are satisfied, hence for the choice of $t = \frac{1}{20}\sqrt{Np(1-p)}$
in Theorem 3 we obtain that

\[ P\left[\, \left| h(G_1,G_1,\psi) - \mathbb{E}h(G_1,G_1,\psi) \right| > \tfrac{1}{5} Np(1-p) \,\right] < 2e^{-Np(1-p)/400}. \tag{16} \]

Also note that

\[ h(G_1,G_1,\psi) = \sum_{(w,u) \in W \times U \,:\, \psi(w) \neq w} \mathbb{1}\left[ \mathbb{1}_{\{w,u\},G_1} \neq \mathbb{1}_{\{\psi(w),u\},G_1} \right] \]

is the sum of $N$ Bernoulli($2p(1-p)$) random variables hence

\[ \mathbb{E}h(G_1,G_1,\psi) = 2Np(1-p). \tag{17} \]

Thus, by (16) and (17) we obtain that

\[ P\left[\, \frac{h(G_1,G_1,\psi)}{ksp(1-p)} \notin [9/5,\, 11/5] \,\right] < 2e^{-\frac{1}{400} ksp(1-p)} < 2e^{-\frac{\xi_2^2}{400} ks}. \tag{18} \]

Conditioning on $G_1$, random variable $h^{0+}(G_1,G_2,\psi)$ has a
Binomial($h^{+}(G_1,G_1,\psi)$, $(1-p)(1-\varrho)$) distribution, and random variable
$h^{0-}(G_1,G_2,\psi)$ has a Binomial($h^{-}(G_1,G_1,\psi)$, $p(1-\varrho)$) distribution.
Conditioning also on the event that $\frac{h(G_1,G_1,\psi)}{ksp(1-p)} \in [9/5,\, 11/5]$, applying Theorem 4 with the
value $t = \frac{\varrho}{2} \cdot h^{+}(G_1,G_1,\psi)$, using (12), we have that

\begin{align}
P\left[\, h^{0+}(G_1,G_2,\psi) > h^{+}(G_1,G_1,\psi) \cdot \left( (1-p)(1-\varrho) + \tfrac{\varrho}{2} \right) \right] &< e^{-\frac{\xi_2^4}{20} ks}, \tag{19} \\
P\left[\, h^{0-}(G_1,G_2,\psi) > h^{-}(G_1,G_1,\psi) \cdot \left( p(1-\varrho) + \tfrac{\varrho}{2} \right) \right] &< e^{-\frac{\xi_2^4}{20} ks}. \tag{20}
\end{align}

Finally, the probability of $T^c$ can be bounded using subadditivity, over all $\psi \in \Pi_\phi$ besides $e$,
on the events described and bounded in (19) and (20), as they each intersect the event described
and bounded in (18) and its complement:

m 



k=i xi,(in^(k) 



^k<i 



^4 _^4 



< 



E 

fc=i 

m 



m 



-Jl 



ks 



2e 400 '=■' + 2e 20 



A:=l 



< y^ I 2e «o^'^''+'^^°§'" + 26^'^'*"'"'^^°^'" I < 



m ■ 



m-' 



the last inequality holding if s > cslogm for sufficiently large, fixed constant C5. Because 
Y2m=i A < 00 we have by the Borel-Cantelli Lemma that T almost always happens. As mentioned 
in Section 3.5, with this fact the proof of the first part of Theorem 2 is complete. □

Remark 8. We do not chase the exact constant $c_5$ here, focusing on the order of magnitude of $s$
instead. Also, if we allow $p$ and $\varrho$ to vary with $m$, then a minor alteration of the above proof (and
a tighter Chernoff-Hoeffding bound) yields the same conclusion as in Theorem 2 part i) if, for (an
arbitrary but) fixed $0 < \epsilon < 2$ and $q_1 := (1-p)(1-\varrho)$ and $q_2 := p(1-\varrho)$,

\[ c_5 := c_5(p, \varrho) > \max\left\{ \frac{2}{H\!\left(q_1 + \frac{\varrho}{2},\, q_1\right) \cdot p(1-p)(2-\epsilon)},\ \frac{2}{H\!\left(q_2 + \frac{\varrho}{2},\, q_2\right) \cdot p(1-p)(2-\epsilon)},\ \frac{16}{\epsilon^2 p(1-p)} \right\}. \]

Details are left to the reader.
3.7 Proof of Theorem 2, part ii)

We now prove the second part of Theorem 2.

Just for now, let $\psi \in \Pi_\phi(m)$ be chosen (i.e., none of the nonseeds are fixed points for $\psi$), and
condition on $h^{+}(G_1,G_1,\psi) = L$, where $\frac{9}{10}smp(1-p) < L < \frac{11}{10}smp(1-p)$. The random variables
$h^{0+}(G_1,G_2,\psi)$ and $h^{0-}(G_1,G_2,\psi)$ are independent, and have distributions Binomial($L, q_1$) and
Binomial($L, q_2$), respectively, where $q_1 := (1-p)(1-\varrho)$ and $q_2 := p(1-\varrho)$.

Denoting $r_1 := q_1 + \frac{\varrho}{2}$ and $r_2 := q_2 + \frac{\varrho}{2}$, we have by Proposition 5 that

\[ P\left[\, h^{0+}(G_1,G_2,\psi) \ge L \cdot r_1 \text{ and } h^{0-}(G_1,G_2,\psi) \ge L \cdot r_2 \,\right] \ge \frac{q_1 q_2}{200L} \sqrt{\frac{(1-r_1)(1-r_2)}{r_1 r_2}} \; e^{-L \cdot H(r_1,q_1) - L \cdot H(r_2,q_2)}. \]



Considering the first order Taylor expansion of $H(x+y, y)$ described in Section 3.4, we have that
$H(r_1, q_1)$ and $H(r_2, q_2)$ are both bounded above by a constant. With the fact that $1 - r_1 = r_2$ and
$1 - r_2 = r_1$, from the above we obtain that there is a positive real number $c$ such that



P|/i°+(G'i,G'2,^)>L-ri and h^-iGi,G2,i') > L ■ r2 ] > — ■ e-'f > —^ 

sm miogm 



e <= 



(21) 



under the hypotheses of the second part of Theorem 2.

Next, $|\Pi_\phi(m)|$ is the number of derangements of an $m$ element set, and it satisfies
$\lim_{m \to \infty} \frac{|\Pi_\phi(m)|}{m!} = \frac{1}{e}$, thus with Stirling's formula we have that for $m$ large
enough it will hold that $|\Pi_\phi(m)| > \left(\frac{m}{e}\right)^m$.



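Thus, for $m$ large enough, by (13) and (21),

$$\begin{aligned}
E\,\bigl|\{\psi \in \Pi_m : f(G_1,G_2,\psi) > f(G_1,G_2,e)\}\bigr|
&= \sum_{\psi\in\Pi_m} P\bigl(f(G_1,G_2,\psi) > f(G_1,G_2,e)\bigr) \\
&\ge \sum_{\psi\in\Pi(m)} P\bigl(f(G_1,G_2,\psi) > f(G_1,G_2,e)\bigr) \\
&\ge \left(\frac{m}{e}\right)^{m}\cdot\frac{1}{m\log m}\cdot e^{-csm} \\
&= \frac{e^{-csm + m\log m - m}}{m\log m},
\end{aligned}$$

so that there exists a fixed real number $c_6 > 0$ such that if $s \le c_6\log m$ then it follows that
$E\,|\{\psi \in \Pi_m : f(G_1,G_2,\psi) > f(G_1,G_2,e)\}| \to \infty$ as $m \to \infty$, and Theorem 2 part ii) is proven. □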

Remark 9. We could tighten the constant $c_6$ here, but choose instead to focus on the order of
magnitude of $s$. If we allow $p$ and $\varrho$ to be functions of $m$, then a simple alteration of the above
proof yields the same results of Theorem 2 part ii) if

$$c_6 := c_6(p,\varrho) < \frac{1}{4\left[H(q_1+\frac{\varrho}{2},\,q_1) + H(q_2+\frac{\varrho}{2},\,q_2)\right]p(1-p)};$$

again details are left to the reader.
4 SGM algorithm: extending approximate graph matching
Frank-Wolfe methodology to a seed scenario

The FAQ algorithm of Vogelstein et al. [12] is an efficient, state-of-the-art approximate graph
matching algorithm based on Frank-Wolfe methodology. It does not utilize seeds. In this section we
describe the SGM algorithm from [6], which extends the Frank-Wolfe methodology to incorporate
utilization of seeds in approximate seeded graph matching, and in Section 4.3 we point out that
the CNS algorithm from Section 2.3 (which was the subject of Theorem 2) is inherently included
in this Frank-Wolfe methodology when there are seeds. Indeed, the CNS segment of SGM helps
to guide the SGM algorithm to a better performance than its unseeded counterpart.

4.1 The Frank-Wolfe algorithm and Frank-Wolfe methodology

First, a brief review of the Frank-Wolfe algorithm: The general optimization problem that the
Frank-Wolfe algorithm is applied to is maximize $f(x)$ such that $x \in S$, where $S$ is a polyhedral set
in a Euclidean space, and the function $f : S \to \mathbb{R}$ is continuously differentiable. The Frank-Wolfe
algorithm is an iterative procedure. A starting point $x^{(1)} \in S$ is chosen in some fashion, perhaps
arbitrarily. For $i = 1, 2, 3, \ldots$, a Frank-Wolfe iteration consists of maximizing the first order (i.e.
linear) approximation to $f$ about $x^{(i)}$, that is maximize $f(x^{(i)}) + \nabla f(x^{(i)})^T (x - x^{(i)})$ over $x \in S$, call
the solution $y^{(i)}$ (of course, this is equivalent to maximizing $\nabla f(x^{(i)})^T x$ over $x \in S$), then $x^{(i+1)}$ is
defined to be the solution to maximize $f(x)$ over $x$ on the line segment from $x^{(i)}$ to $y^{(i)}$. Terminate
the Frank-Wolfe algorithm when the sequence of iterates $x^{(1)}, x^{(2)}, \ldots$ stops changing much.
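The iteration just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the polyhedral set $S$ is taken to be the probability simplex (so the linear subproblem is solved by picking a vertex), the line search uses SciPy's `minimize_scalar` with the bounded method, and the names `frank_wolfe_simplex`, `target`, `f`, and `grad_f` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def frank_wolfe_simplex(f, grad, x0, max_iter=500, tol=1e-8):
    """Maximize f over the probability simplex {x >= 0, sum(x) = 1}."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        # Linear subproblem: a linear function on the simplex peaks at a vertex.
        y = np.zeros_like(x)
        y[np.argmax(g)] = 1.0
        # Line search on the segment from x to y, step size in [0, 1].
        res = minimize_scalar(lambda t: -f(x + t * (y - x)),
                              bounds=(0.0, 1.0), method="bounded")
        x_new = x + res.x * (y - x)
        # Stop when the iterate no longer improves or barely moves.
        if f(x_new) <= f(x) or np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x


# Toy concave objective: negative squared distance to a point in the simplex.
target = np.array([0.2, 0.3, 0.5])
f = lambda x: -np.sum((x - target) ** 2)
grad_f = lambda x: -2.0 * (x - target)

x_start = np.array([1.0, 0.0, 0.0])
x_star = frank_wolfe_simplex(f, grad_f, x_start)
print(x_star, f(x_star))
```

Every iterate is a convex combination of points of $S$, so feasibility is maintained without any projection step.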

Of course, the Seeded Graph Matching Problem is a combinatorial optimization problem and,
as such, the Frank-Wolfe algorithm cannot be directly applied. The term Frank-Wolfe methodology
will refer to the approach in which we relax integer constraints so that the domain is a
polyhedral set and the Frank-Wolfe algorithm can be directly applied to the relaxation and, at
the termination of the Frank-Wolfe algorithm, we project the fractional solution to the nearest
feasible integer solution. It is this projection that we adopt as an approximate solution to the
original combinatorial optimization problem. We next describe the SGM algorithm, which applies
Frank-Wolfe methodology to the Seeded Graph Matching Problem.

4.2 The SGM algorithm 

Suppose $G_1$ and $G_2$ are graphs, say $V(G_1) = \{v_1, v_2, \ldots, v_n\}$ and $V(G_2) = \{v'_1, v'_2, \ldots, v'_n\}$, and let
$A$ and $B$ be the respective adjacency matrices of $G_1$ and $G_2$. Suppose without loss of generality that
$U_1 = \{v_1, v_2, \ldots, v_s\}$ are seeds, and the seeding function $\phi : U_1 \to V(G_2)$ is given by $\phi(v_i) = v'_i$ for
all $i = 1, 2, \ldots, s$. Denote the number of nonseed vertices $m := n - s$. Let $A$ and $B$ be partitioned

$$A = \begin{bmatrix} A_{11} & A_{21}^T \\ A_{21} & A_{22} \end{bmatrix}, \qquad B = \begin{bmatrix} B_{11} & B_{21}^T \\ B_{21} & B_{22} \end{bmatrix},$$

where $A_{11}, B_{11} \in \{0,1\}^{s\times s}$, $A_{22}, B_{22} \in \{0,1\}^{m\times m}$, and $A_{21}, B_{21} \in \{0,1\}^{m\times s}$.



As mentioned in Section 2.3, the Seeded Graph Matching Problem is precisely to minimize
$\|A - (I \oplus P)B(I \oplus P)^T\|_F^2 = \|A\|_F^2 + \|B\|_F^2 - 2\cdot\operatorname{trace} A^T(I \oplus P)B(I \oplus P)^T$ over all $m$-by-$m$
permutation matrices $P$, which is equivalent to maximizing $\operatorname{trace} A^T(I \oplus P)B(I \oplus P)^T$ over the same
matrices. Relax this maximization over all $m$-by-$m$ permutation matrices $P$ to the maximization of
$\operatorname{trace} A^T(I \oplus P)B(I \oplus P)^T$ over all $m$-by-$m$ doubly stochastic matrices $P$ (which form a polyhedral
set), and then the Frank-Wolfe algorithm can be applied directly to the relaxation. Simplification
yields the objective function

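$$\begin{aligned}
f(P) &= \operatorname{trace} A_{11}^T B_{11} + \operatorname{trace} A_{21}^T P B_{21} + \operatorname{trace} A_{21} B_{21}^T P^T + \operatorname{trace} A_{22}^T P B_{22} P^T \\
&= \operatorname{trace} A_{11}^T B_{11} + 2\cdot\operatorname{trace} P^T A_{21} B_{21}^T + \operatorname{trace} A_{22}^T P B_{22} P^T
\end{aligned} \tag{22}$$

which has gradient

$$\nabla f(P) = 2\cdot A_{21}B_{21}^T + 2\cdot A_{22}PB_{22}.$$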
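The block decomposition of the objective and its gradient can be checked numerically. The sketch below is only an illustration under stated assumptions: the graphs are undirected (symmetric adjacency matrices), the sizes `s = 3` and `m = 4` are arbitrary, and the helper names `random_adjacency`, `f_full`, `f_blocks`, and `grad` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
s, m = 3, 4          # number of seeds and nonseeds (arbitrary small sizes)
n = s + m


def random_adjacency(size):
    """Symmetric 0/1 matrix with zero diagonal (an undirected simple graph)."""
    upper = np.triu(rng.integers(0, 2, size=(size, size)), k=1)
    return upper + upper.T


A, B = random_adjacency(n), random_adjacency(n)
A11, A21, A22 = A[:s, :s], A[s:, :s], A[s:, s:]
B11, B21, B22 = B[:s, :s], B[s:, :s], B[s:, s:]


def f_full(P):
    """trace A^T (I (+) P) B (I (+) P)^T computed on the full matrices."""
    IP = np.block([[np.eye(s), np.zeros((s, m))], [np.zeros((m, s)), P]])
    return np.trace(A.T @ IP @ B @ IP.T)


def f_blocks(P):
    """The same objective written with the seed/nonseed blocks."""
    return (np.trace(A11.T @ B11) + 2 * np.trace(P.T @ A21 @ B21.T)
            + np.trace(A22.T @ P @ B22 @ P.T))


def grad(P):
    return 2 * A21 @ B21.T + 2 * A22 @ P @ B22


P = np.full((m, m), 1.0 / m)            # barycenter of the doubly stochastic matrices
E = np.zeros((m, m)); E[1, 2] = 1.0     # perturb a single entry of P
h = 1e-6
fd = (f_blocks(P + h * E) - f_blocks(P - h * E)) / (2 * h)
print(f_full(P), f_blocks(P), fd, grad(P)[1, 2])
```

Because $f$ is quadratic in $P$, the central difference agrees with the gradient entry up to rounding error.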

We start the Frank-Wolfe algorithm at an arbitrarily selected doubly stochastic $m$-by-$m$ matrix
$P^{(1)}$; for convenience we use the "barycenter" matrix $P^{(1)}$ with all entries equal to $\frac{1}{m}$. Then, for
successive $i = 1, 2, \ldots$, the Frank-Wolfe iteration is to maximize the inner product of $P$ with the
gradient of $f$ at $P^{(i)}$ over all $m$-by-$m$ doubly stochastic matrices $P$; this maximization
problem is (ignoring a benign factor of 2) maximizing $\operatorname{trace} P^T(A_{21}B_{21}^T + A_{22}P^{(i)}B_{22})$ over $m$-by-$m$ doubly
stochastic matrices, and this is a linear assignment problem since the optimal $P$ in this subproblem
can be taken to be a permutation matrix (by the Birkhoff-von Neumann Theorem, which states that the $m$-by-$m$
doubly stochastic matrices are precisely the convex hull of the $m$-by-$m$ permutation matrices),
and the linear assignment problem can be solved efficiently with the Hungarian Algorithm in
$O(m^3)$ time. Say the optimal value of $P$ in this subproblem is $Y^{(i)}$; then, the function $f$ on the
line segment from $P^{(i)}$ to $Y^{(i)}$ is a quadratic that is easily maximized exactly, with $P^{(i+1)}$ defined
as the doubly stochastic matrix attaining this maximum.

When the Frank-Wolfe iterates $P^{(1)}, P^{(2)}, P^{(3)}, \ldots$ stop changing much (or a constant maximum
number of iterations is performed; we allowed 20 iterations) then the Frank-Wolfe algorithm terminates,
say the resultant approximate solution to the relaxed problem is the doubly stochastic matrix $Q$.
The final step is to project $Q$ to the nearest $m$-by-$m$ permutation matrix. Minimizing $\|P - Q\|_F$
over permutation matrices $P$ is again a linear assignment problem solvable in $O(m^3)$ time; indeed,
minimizing $\|P - Q\|_F^2 = \|P\|_F^2 - 2\cdot\operatorname{trace} P^TQ + \|Q\|_F^2$ is equivalent to maximizing $\operatorname{trace} P^TQ$
over permutation matrices $P$, since $\|P\|_F^2 = m$ for every permutation matrix. This optimal permutation matrix $P$ is adopted as the approximate
solution to the seeded graph matching problem. Specifically, the algorithm output is the bijection
$\psi : V(G_1) \to V(G_2)$ where, for $i = 1, 2, \ldots, s$, $\psi(v_i) = v'_i$ and, for each $i = 1, 2, \ldots, m$, $\psi(v_{s+i}) =
v'_{s+j}$ for the $j$ such that $P_{ij} = 1$. This Frank-Wolfe methodology approach described above is
called the SGM algorithm.
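The steps above can be sketched compactly in Python. This is a hedged illustration rather than the authors' code: it assumes undirected graphs (symmetric $A$ and $B$) with the seeds listed first, uses SciPy's `linear_sum_assignment` in place of a hand-written Hungarian algorithm, and the function name `sgm`, its parameters, and the demo variables are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def sgm(A, B, s, max_iter=20, tol=1e-6):
    """Sketch of seeded graph matching; vertices 0..s-1 of A and B are the seeds."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    m = A.shape[0] - s
    A21, A22 = A[s:, :s], A[s:, s:]
    B21, B22 = B[s:, :s], B[s:, s:]

    P = np.full((m, m), 1.0 / m)                    # barycenter start
    for _ in range(max_iter):
        grad = 2 * A21 @ B21.T + 2 * A22 @ P @ B22  # gradient for symmetric A, B
        rows, cols = linear_sum_assignment(grad, maximize=True)
        Y = np.zeros((m, m))
        Y[rows, cols] = 1.0
        # On the segment, f(P + t D) = f(P) + b t + a t^2 with D = Y - P.
        D = Y - P
        a = np.trace(A22.T @ D @ B22 @ D.T)
        b = np.sum(grad * D)
        candidates = [0.0, 1.0]
        if a < 0:
            candidates.append(min(1.0, max(0.0, -b / (2 * a))))
        t = max(candidates, key=lambda u: b * u + a * u * u)
        P_new = P + t * D
        if np.linalg.norm(P_new - P) < tol:
            break
        P = P_new

    # Project onto the nearest permutation matrix (maximize trace P^T Q).
    _, match = linear_sum_assignment(P, maximize=True)
    return match  # nonseed s+i of G1 is matched to nonseed s+match[i] of G2


# Demo: G2 is G1 with its nonseed vertices relabeled by a hidden permutation.
rng = np.random.default_rng(0)
n_demo, s_demo = 30, 10
m_demo = n_demo - s_demo
upper = np.triu(rng.integers(0, 2, size=(n_demo, n_demo)), k=1)
A_demo = upper + upper.T
sigma = rng.permutation(m_demo)
full = np.concatenate([np.arange(s_demo), s_demo + sigma])
B_demo = np.zeros_like(A_demo)
B_demo[np.ix_(full, full)] = A_demo   # B[full[i], full[j]] = A[i, j]
match_demo = sgm(A_demo, B_demo, s_demo)
print("fraction of nonseeds correctly matched:", np.mean(match_demo == sigma))
```

The exact line search uses only the quadratic coefficients `a` and `b`, so each iteration is dominated by the linear assignment solve.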

The running time for the SGM algorithm is $O(n^3)$, since each Frank-Wolfe iteration costs $O(m^3)$
and the number of iterations is bounded by a constant. This is because of the linear assignment
problem formulation and the use of the Hungarian algorithm in each Frank-Wolfe iteration, and
is a huge savings over using the simplex method or an interior point method for solving the
linearizations in each Frank-Wolfe iteration. This trick has made Frank-Wolfe methodology a
potent weapon for efficient approximate graph matching.

4.3 The CNS algorithm is embedded in the SGM algorithm 



In each Frank-Wolfe iteration (described in Section 4.2), the linearization which is solved is
maximize $\left(\operatorname{trace} P^TA_{21}B_{21}^T + \operatorname{trace} P^TA_{22}P^{(i)}B_{22}\right)$ over all $m$-by-$m$ permutation matrices $P$. Observe that
this maximization just over the first term is precisely the CNS algorithm of Section 2.4; in this
manner the CNS algorithm is naturally embedded in Frank-Wolfe methodology when there are seeds.
Also, the maximization in each Frank-Wolfe iteration just over the second term is precisely what
would be employed in the FAQ algorithm [12] in the absence of all of the seeds. In this manner,
the SGM algorithm can be seen as leveraging a combination of the information gleaned from the
nonseed-seed relationships (the CNS term) and the nonseed-nonseed relationships (the FAQ term).
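The CNS term on its own amounts to a single linear assignment problem on the nonseed-to-seed adjacency blocks. The sketch below illustrates this with SciPy's `linear_sum_assignment`; the function name `cns` and the toy data are hypothetical, and the seeds are assumed to be listed first in both graphs.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def cns(A, B, s):
    """Match nonseeds by maximizing trace P^T A21 B21^T over permutation matrices P."""
    A21 = np.asarray(A)[s:, :s]   # nonseed-to-seed adjacencies in G1
    B21 = np.asarray(B)[s:, :s]   # nonseed-to-seed adjacencies in G2
    _, match = linear_sum_assignment(A21 @ B21.T, maximize=True)
    return match                  # nonseed s+i of G1 matched to nonseed s+match[i] of G2


# Toy example: 3 seeds, 3 nonseeds with distinct seed-adjacency patterns.
s_toy = 3
A21_toy = np.array([[1, 0, 0],
                    [0, 1, 1],
                    [1, 1, 1]])
sigma_toy = np.array([2, 0, 1])   # hidden relabeling of the nonseeds in G2
B21_toy = np.zeros_like(A21_toy)
B21_toy[sigma_toy] = A21_toy      # row sigma(i) of B21 equals row i of A21

zeros = np.zeros((3, 3), dtype=int)
A_toy = np.block([[zeros, A21_toy.T], [A21_toy, zeros]])
B_toy = np.block([[zeros, B21_toy.T], [B21_toy, zeros]])
match_toy = cns(A_toy, B_toy, s_toy)
print(match_toy)
```

When the rows of $A_{21}$ are distinct and the two graphs agree on the nonseed-to-seed edges, as in this toy case, the hidden relabeling is the unique maximizer; with noisy or repeated rows, ties and errors become possible, which is where the nonseed-nonseed term of SGM helps.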

Although the simpler CNS algorithm almost always yields the desired alignment in the presence
of sufficiently many seeds, the SGM algorithm, in utilizing the relationships between the unseeded
vertices, is often more effective at estimating the true alignment function; see Figure 1. There
we compare the performance of SGM with that of CNS for correlated Erdős-Rényi graphs with
$n = 300$ vertices, $p = 0.5$, seeding levels ranging from $s = 0$ to $275$, and correlation ranging
from $\varrho = 0.1$ to $1$. For each value of $\varrho$ and $s$, we ran 100 simulations and plotted the fraction of
nonseeded vertices correctly matched across the graphs, with corresponding error bars of ±2 s.e.
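For readers who want to set up a similar experiment, the sketch below samples a pair of correlated Erdős-Rényi graphs. It assumes the usual construction in which each vertex pair is an edge in each graph with probability $p$ and the two edge indicators have correlation $\varrho$; the function name `correlated_er_pair` and the demo parameters are hypothetical and are not taken from the original simulations.

```python
import numpy as np


def correlated_er_pair(n, p, rho, rng):
    """Sample adjacency matrices of two rho-correlated Erdos-Renyi(n, p) graphs."""
    iu = np.triu_indices(n, k=1)
    e1 = rng.random(iu[0].size) < p
    # Conditional edge probability in the second graph:
    # p + rho * (1 - p) if the pair is an edge in the first graph, else p * (1 - rho).
    prob2 = np.where(e1, p + rho * (1 - p), p * (1 - rho))
    e2 = rng.random(iu[0].size) < prob2
    A = np.zeros((n, n), dtype=int)
    B = np.zeros((n, n), dtype=int)
    A[iu] = e1
    B[iu] = e2
    return A + A.T, B + B.T


rng = np.random.default_rng(1)
A_er, B_er = correlated_er_pair(200, 0.5, 0.7, rng)
iu_er = np.triu_indices(200, k=1)
empirical_rho = np.corrcoef(A_er[iu_er], B_er[iu_er])[0, 1]
print("empirical edge correlation:", empirical_rho)

A_same, B_same = correlated_er_pair(50, 0.5, 1.0, rng)
```

Under this construction an edge of the first graph is absent from the second with probability $(1-p)(1-\varrho)$ and a non-edge becomes an edge with probability $p(1-\varrho)$, matching the quantities $q_1$ and $q_2$ used in Section 3.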

Also note the following from Figure 1: When there are no seeds, we see FAQ (which is SGM
in the absence of seeds) working perfectly at capturing the true alignment function when the
two graphs are isomorphic (it bears noting that we have also observed FAQ perfectly matching
when the two graphs are not isomorphic but rather *very* highly correlated), but FAQ does a
surprisingly poor job (indeed, comparable to chance) when the correlation is even modestly less
than one. However, with seeds, SGM quickly does a very substantially better job; indeed, the
seeding term of the CNS algorithm is "steering" the SGM algorithm in the proper direction!
Figure 1: Fraction of vertices correctly matched for the SGM algorithm and the CNS algorithm,
plotted versus the number of seeds utilized, for $n = 300$, $p = 1/2$ and correlation $\varrho$ varying from
0.1 to 1. For each value of $\varrho$ and $s$, we ran 100 simulations and plot the fraction of nonseeded
vertices correctly matched across the graphs, with corresponding error bars of ±2 s.e.